{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5. Markov decision processes\n",
    "\n",
    "$$ B(V)(s) = R(s)+\\gamma \\max_{a\\in A}\\sum_{s'\\in S}P_{sa}(s')V(s')$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (a)\n",
    "Let $s^* = \\arg\\max_{s'\\in S}|V_1(s')-V_2(s')|$. This means that for all $s'\\in S$ we have \n",
    "$$ -|V_1(s^*)-V_2(s^*)|\\leq V_1(s')-V_2(s') \\leq |V_1(s^*)-V_2(s^*)|.$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore for any $s\\in S$ we have \n",
    "$$\\begin{align*}\n",
    "B(V_1)(s) - B(V_2)(s) &= \\gamma \\max_{a\\in A}\\sum_{s'\\in S}P_{sa}(s')V_1(s') -\\gamma\\max_{a\\in A}\\sum_{s'\\in S}P_{sa}(s')V_2(s')\\\\\n",
    "&\\leq \\gamma \\max_{a\\in A}\\sum_{s'\\in S}P_{sa}(s')\\left(|V_1(s^*)-V_2(s^*)| + V_2(s')\\right) -\\gamma\\max_{a\\in A}\\sum_{s'\\in S}P_{sa}(s')V_2(s')\\\\\n",
    "&=\\gamma |V_1(s^*)-V_2(s^*)| = \\gamma\\|V_1-V_2\\|_\\infty,\n",
    "\\end{align*}$$\n",
    "where we made use of $\\sum_{s'\\in S}P_{sa}(s')=1$ for all $a\\in A$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly one can show \n",
    "$$-\\gamma\\|V_1-V_2\\|_\\infty \\leq  B(V_1)(s) - B(V_2)(s)$$\n",
    "for all $s\\in S$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It follows \n",
    "$$ |B(V_1)(s) - B(V_2)(s)|\\leq \\gamma\\|V_1-V_2\\|_\\infty$$\n",
    "for all $s$, which means \n",
    "$$\\|B(V_1)-B(V_2)\\|_\\infty \\leq  \\gamma\\|V_1-V_2\\|_\\infty.$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (b)\n",
    "\n",
    "If $V_1$ and $V_2$ are fixed points of $B$, then\n",
    "$$\\|V_1-V_2\\|_\\infty = \\|B(V_1)-B(V_2)\\|_\\infty \\leq \\gamma\\|V_1-V_2\\|,$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "from which \n",
    "$$(1-\\gamma) \\|V_1-V_2\\|_\\infty  \\leq 0 $$\n",
    "follows."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From $\\|V_1-V_2\\|\\geq 0$ and  $1-\\gamma >0$  we can then conclude $\\|V_1-V_2\\|_\\infty = 0$, which implies $V_1=V_2$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Remark:** The existence and uniqueness of a fixed point of $B$ follow from the Banach fixed point theorem."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
